STATISTICAL MODELS FOR TIME SERIES AND LIFE TESTING WITH APPLICATIONS
IN ENGINEERING SYSTEMS


OVERVIEW  OF  RESEARCH 

Over the period of this grant, 5 papers have been accepted for publication,
1 paper has been published, 5 technical reports have been written, and 4 theses
have been completed. Over the period 1 April 1972 to June 1978, 35 papers have
been published, 9 papers have been accepted for publication, 41 technical reports
have been written, and one book has been written and another revised. All of these
have received support from the present and preceding Air Force grants. Many of the
publications reflect our continuing interest in the topics of stochastic
systems (especially with discrete control) in model building, smoothing and
curve estimation, methods of approximation with noisy data, inference with
censored data, and life testing and reliability with applications to systems.


Part  1 - Time  Series  Models 

Research  in  this  contract  period  has  led  to  a deeper  understanding  of 
stochastic  models  containing  deterministic  components,  and  of  the  relationship 
between  methods  of  forecasting  using  ARMA  models  and  generalized  exponential 
smoothing.  Also  the  question  of  optimal  sampling  periods  for  efficient  sample 
data  feedback  control  has  received  attention. 


Deterministic  or  Stochastic  Models? 


Models of the form

$y_t = f(t) + e_t$

in which $f(t)$ represents a deterministic function of time and $e_t$ a
random error have often been misused to represent series whose development
could much more plausibly be represented by a stochastic model such as an
autoregressive-moving average (ARMA) process.

There  will,  however,  be  situations  in  which  a deterministic  component 
actually  exists.  Our  research  shows  that  appropriate  analysis  of  a fitted 
ARMA  process  can  point  to  the  necessity  for  a deterministic  component. 

A paper describing these findings has recently been accepted for
publication in "Applied Statistics" [1].
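One way to see how a fitted stochastic model can signal a deterministic component is the minimal sketch below. It is an illustration under stated assumptions, not the analysis of [1]: the linear trend, the noise level, the seed and the helper name `lag1_autocorrelation` are all hypothetical. If $y_t = a + bt + e_t$ with white noise $e_t$, then differencing gives $y_t - y_{t-1} = b + e_t - e_{t-1}$, a moving average whose parameter sits on the boundary of invertibility and whose lag-one autocorrelation is -0.5. A differenced series that behaves this way hints that the trend was deterministic rather than stochastic.

```python
import random


def lag1_autocorrelation(x):
    """Sample lag-one autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    c1 = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    return c1 / c0


# Hypothetical series: deterministic linear trend plus white noise.
rng = random.Random(0)
y = [2.0 + 0.05 * t + rng.gauss(0.0, 1.0) for t in range(2000)]

# First differences: b + e_t - e_{t-1}, a non-invertible moving average.
dy = [y[t] - y[t - 1] for t in range(1, len(y))]

r1 = lag1_autocorrelation(dy)
print(round(r1, 3))  # should be close to the theoretical value -0.5
```

A series generated by a genuinely stochastic trend (for example a random walk) would instead give differences with autocorrelation near zero.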

When  are  exponential  smoothing  methods  optimal? 

Generalized  exponential  smoothing  forecasting  procedures  are  used 
extensively  in  many  areas  of  economics,  business  and  engineering.  Our  research 
shows  that: 

i) These forecasting procedures are optimal, in the sense of achieving
minimum mean square error forecasts, only if the underlying
stochastic process is included in a limited subclass of ARIMA
(p,d,q) processes. Hence, it is shown what assumptions are
made when using these procedures.

ii) The implication of point (i) is that the users of these procedures
tacitly assume that the stochastic processes which occur in the real
world are from a very restricted subclass of stochastic processes.
No reason can be found why these particular models should occur
more frequently than others.

iii) It is further shown that, even if a stochastic process which would lead
to such a procedure did occur, the actual methods used for making
the forecasts are clumsy. Much simpler methods are available.

A paper  describing  these  findings  was  recently  published  in  "Metrika."[2] 
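The simplest instance of the first point can be checked numerically. The sketch below assumes simple (first-order) exponential smoothing with constant `alpha`, which is only one member of the generalized family discussed above. Its recursion coincides with the minimum mean square error one-step forecast of the ARIMA(0,1,1) process $y_t - y_{t-1} = a_t - \theta a_{t-1}$ when $\theta = 1 - \alpha$. The function names, the starting value and the data are hypothetical.

```python
def simple_exponential_smoothing(y, alpha, start):
    """One-step-ahead forecasts from the usual smoothing recursion."""
    forecasts = []
    level = start
    for obs in y:
        level = alpha * obs + (1.0 - alpha) * level
        forecasts.append(level)
    return forecasts


def ima11_forecasts(y, theta, start):
    """One-step-ahead forecasts for y_t - y_{t-1} = a_t - theta * a_{t-1}."""
    forecasts = []
    previous_forecast = start
    for obs in y:
        shock = obs - previous_forecast          # one-step forecast error a_t
        previous_forecast = obs - theta * shock  # forecast of the next value
        forecasts.append(previous_forecast)
    return forecasts


series = [10.0, 10.8, 10.3, 11.1, 11.9, 11.4, 12.2]
alpha = 0.3
ses = simple_exponential_smoothing(series, alpha, start=series[0])
ima = ima11_forecasts(series, theta=1.0 - alpha, start=series[0])

# The two forecast sequences agree up to rounding error.
max_gap = max(abs(a - b) for a, b in zip(ses, ima))
print(max_gap < 1e-12)
```

The agreement holds for any data, which is why using the smoothing recursion amounts to assuming this particular ARIMA structure.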

Sampling  Interval  and  Feedback  Control 

One  question  which  often  arises  in  sample  data  feedback  systems  is 
how  frequently  should  one  sample.  In  an  optimal  control  scheme,  we  suppose  that 
a control  arrangement  will  be  employed  in  which  the  sampling  interval  is  h 
units  long  where  h is  an  unknown  integer.  The  equations  describing  both  the 
dynamics  and  the  noise  process  now  depend  on  this  interval  and  a reasonable  cost 
function  may  be  formulated  as  a function  of  h.  The  optimum  time  interval  at  which 
surveillance  should  be  conducted,  i.e.  observations  taken  and  control  adjustment 
action  taken,  is  obtained  by  minimizing  the  above  cost  function. 
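The minimization over h can be sketched as follows. The cost function here is purely hypothetical and is not the one derived in the paper: it assumes a random-walk disturbance with variance `shock_variance` per unit time, no process dynamics, an adjustment at each sampling instant that cancels the deviation exactly, a fixed cost per observation, and a cost proportional to the mean square deviation from target.

```python
def cost_per_unit_time(h, sampling_cost, deviation_cost, shock_variance):
    """Hypothetical cost of sampling and adjusting every h time units."""
    # Random-walk drift: j units after a perfect adjustment the mean square
    # deviation is j * shock_variance; average this over j = 1, ..., h.
    mean_square_deviation = shock_variance * (h + 1) / 2.0
    return sampling_cost / h + deviation_cost * mean_square_deviation


def best_interval(max_h, sampling_cost, deviation_cost, shock_variance):
    """Integer h in 1..max_h with the smallest cost per unit time."""
    return min(
        range(1, max_h + 1),
        key=lambda h: cost_per_unit_time(
            h, sampling_cost, deviation_cost, shock_variance
        ),
    )


h_opt = best_interval(
    max_h=50, sampling_cost=8.0, deviation_cost=1.0, shock_variance=1.0
)
print(h_opt)  # prints 4
```

Frequent sampling is penalized through the first term and infrequent sampling through the second, so the optimum balances the two.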

A paper describing these findings has been accepted for publication in
"Technometrics" [3].

Spline Smoothing for the Non-Parametric Recovery of Curves

Consider the model

$Y(t_i) = g(t_i) + \epsilon_i, \quad i = 1, \ldots, n, \quad t_i \in T,$

where $\epsilon = (\epsilon_1, \ldots, \epsilon_n)' \sim N(0, \sigma^2 I_{n \times n})$ and $g(\cdot)$ is some "smooth" function defined
on some index set $T$. When $T$ is an interval of the real line, cubic polynomial
smoothing splines are well known to provide an esthetically satisfying method
for estimating $g(\cdot)$ from a realization $y = (y_1, \ldots, y_n)'$ of $Y = (Y(t_1), \ldots, Y(t_n))'$.
Splines  are  an  appealing  alternative  to  fitting  a specified  set  of  m regression 
functions,  for  example  polynomials  of  degree  less  than  m,  when  one  is  uncertain 
that the true curve $g(\cdot)$ is actually in the span of the specified regression
functions. We show that polynomial spline (respectively generalized spline)
smoothing is equivalent to Bayesian estimation with a prior on $g$ which is
"diffuse" on the coefficients of the polynomials of degree $< m$ (respectively
specified  set  of  m regression  functions),  and  "proper"  over  an  appropriate 
set  of  random  variables  not  including  the  coefficients  of  the  regression 
functions.  Since  Gauss  Markov  estimation  is  equivalent  to  Bayesian  estimation 
with  a prior  diffuse  over  the  coefficients  of  the  regression  functions,  this 
result  leads  to  the  conclusion  that  spline  smoothing  is  a (the?)  natural 
extension  of  Gauss-Markov  regression  with  m specified  regression  functions.  It 
is  shown  that  spline  smoothing  is  an  appropriate  solution  to  the  problem 
arising  when  one  wants  to  fit  a given  set  of  regression  functions  to  the  data 
but  one  also  wants  to  "hedge"  against  model  errors,  that  is,  against  the 
possibility  that  the  true  model  g is  not  exactly  in  the  span  of  the  given  set 
of  regression  functions.  We  show  that  the  spline  smoothing  approach  leads  to 
a natural  measure  of  the  deviation  of  the  true  g from  the  span  of  the  regression 
functions,  and  furthermore,  a good  value  of  this  measure  can  be  estimated  from 
the  data.  The  estimated  value  of  the  measure  is  then  used  to  control  the 
deviation  of  the  estimated  g. 

A paper describing these findings has recently been accepted for
publication in the Journal of the Royal Statistical Society, Series B [6].
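For orientation, the polynomial smoothing spline discussed above is usually defined as the minimizer of a penalized least squares criterion. The form below is the standard textbook statement, offered as an assumption about notation rather than a quotation from [6]: $\lambda \ge 0$ is the smoothing parameter, $T = [a, b]$, and $m = 2$ gives the cubic case.

```latex
\hat{g}_{\lambda}
  = \arg\min_{g} \;
    \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - g(t_i) \bigr)^{2}
    + \lambda \int_{a}^{b} \bigl( g^{(m)}(t) \bigr)^{2} \, dt
```

As $\lambda \to \infty$ the penalty forces $g^{(m)} = 0$, so the estimate tends to the least squares polynomial of degree less than $m$; as $\lambda \to 0$ it tends to an interpolant of the data. A single parameter therefore governs how far the estimate may depart from the span of the polynomial regression functions.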

Goodness  of  Fit  Tests 

Some basic work on the theory of k-spacings in goodness-of-fit tests
has been completed. The goodness-of-fit problem is the problem of testing
whether a set of independent observations $X_1, X_2, \ldots, X_n$ with unknown cumulative
distribution function $F$ actually came from a population with a specified
distribution $F_0$. If (loosely) $X^{(1)}, X^{(2)}, \ldots, X^{(n)}$ are the ordered observations,
then $S_1 = X^{(2)} - X^{(1)}$, $S_2 = X^{(3)} - X^{(2)}$, ..., $S_{n-1} = X^{(n)} - X^{(n-1)}$ are the spacings.

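The k-spacings are defined, for $n+1 = Nk$, by

[defining equation illegible in the source]

Some classical goodness-of-fit tests are based on statistics of the form

[equation illegible in the source]

and tests based on k-spacings are of the form

$T_N =$ [right-hand side illegible in the source],

where the functions applied to the spacings are, typically, convex. A general theory has been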
developed showing that tests based on k-spacings, for suitable k, have better
properties than tests based on ordinary spacings. The asymptotic distribution
theory of $T_N$ under alternatives near to $F_0$ has been obtained [4]. Some
substantial contributions have been made to the important problem of the
behavior of k-spacings goodness-of-fit tests when some parameters in $F_0$ must
be estimated [5].
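A small sketch may help fix ideas about k-spacings. It rests on assumptions that are not taken from [4] or [5]: the observations are first transformed by the hypothesized distribution function, the end points 0 and 1 are added so that there are n + 1 simple spacings, the k-spacings are the N disjoint gaps between every k-th point, and the convex function is the square. All function names are hypothetical.

```python
def k_spacings(sample, cdf, k):
    """Gaps between every k-th point of the transformed, ordered sample."""
    u = sorted(cdf(x) for x in sample)
    grid = [0.0] + u + [1.0]      # n + 2 points, hence n + 1 simple spacings
    if (len(grid) - 1) % k != 0:
        raise ValueError("n + 1 must be a multiple of k")
    picked = grid[::k]            # points with index 0, k, 2k, ..., Nk
    return [right - left for left, right in zip(picked, picked[1:])]


def sum_of_squares_statistic(spacings):
    """Sum of a convex function (here the square) of the spacings."""
    return sum(d * d for d in spacings)


def uniform_cdf(x):
    """Hypothesized distribution function: uniform on (0, 1)."""
    return x


sample = [0.10, 0.35, 0.50, 0.80, 0.90]          # n = 5, so n + 1 = 6
two_spacings = k_spacings(sample, uniform_cdf, k=2)  # N = 3 gaps
statistic = sum_of_squares_statistic(two_spacings)
print(len(two_spacings), round(statistic, 3))
```

With k = 1 the same function returns the ordinary spacings, so the two kinds of test can be compared on the same data.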


This  work  has  been  submitted  for  publication. 
Part 2 - Life Testing and Reliability


The stress-strength model for the reliability of a single component
assumes that a component, having random strength Y, is placed under a random
stress X by its operating environment. In [1] we first extend this model
to an s-out-of-k system having identical components. It is assumed that
the  distribution  of  Y is  unknown  and  the  distribution  of  X can  be  either 
known  or  unknown.  Under  the  assumption  that  the  unknown  c.d.f.'s  are  continuous, 
we  obtain  the  UMVU  estimator  of  reliability  and  another  estimator  based  on 
empirical  distributions.  Then,  employing  some  notions  of  weak  convergence, 
we  derive  their  limiting  distributions  and  establish  asymptotic  equivalence. 
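An estimator based on empirical distributions can be sketched as follows. This is a plug-in illustration under assumptions that are not confirmed by the text: the k components have independent strengths with a common distribution, all of them face the same stress X, and the system functions when at least s of the k strengths exceed that stress. It is not the UMVU estimator of [1], and the function name is hypothetical.

```python
from math import comb


def system_reliability_estimate(stresses, strengths, s, k):
    """Plug-in estimate of P(at least s of k strengths exceed the stress)."""
    m = len(strengths)
    total = 0.0
    for x in stresses:
        # Empirical probability that a single strength exceeds stress x.
        p = sum(1 for y in strengths if y > x) / m
        # Binomial probability that at least s of k components survive x.
        total += sum(
            comb(k, j) * p ** j * (1.0 - p) ** (k - j) for j in range(s, k + 1)
        )
    return total / len(stresses)


stresses = [1.0, 2.0, 3.0]
strengths = [2.5, 3.5, 0.5, 4.0]

single = system_reliability_estimate(stresses, strengths, s=1, k=1)
parallel = system_reliability_estimate(stresses, strengths, s=1, k=2)
series_system = system_reliability_estimate(stresses, strengths, s=2, k=2)
print(round(single, 4), round(parallel, 4), round(series_system, 4))
```

With s = k = 1 the estimate reduces to the proportion of (stress, strength) pairs in which the strength is larger.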

According to the weakest link theory for the behavior of materials or
systems, life length (or strength) will have a distribution that is closely
approximated by one of the three types of extreme value distributions. By
considering the class of transformations $(1 + (x - a)/b)^{\lambda}$, $-\infty < \lambda < 0$
and $0 < \lambda < \infty$, we find in [2] that the problem of finding the best fitting
extreme value distribution is equivalent to finding $a$, $b$, and $\lambda$ that transform
the observations to a standard negative exponential. The power $\lambda$ selected
by our procedure then identifies the appropriate extreme value distribution.

Our approach of finding the posterior distribution of $\lambda$ also ascertains
the relative fit of the best fitting models selected from each of the three
different types. The large sample behavior of the technique is studied and many
examples are given.

Asymptotic  normality  and  efficiency  of  the  modified  least  squares 
estimators  (MLSE)  are  studied  in  [3,4]  in  the  context  of  some  accelerated 
life  test  models.  A general  parametric  family  of  life  distributions,  involving 
the  scale  and  shape  parameters,  is  considered  where  the  logarithm  of  the  scale 
parameter is assumed to be linearly related to the stress variables. Many of
the widely used engineering models (Arrhenius, Eyring, inverse power law,
generalized Eyring, etc.) are special cases of this formulation. Aside from
a rigorous treatment of the limiting normality of the MLSE and the maximum
likelihood estimators, the asymptotic efficiencies of the MLSE are derived
both under complete samples and type II censored samples. Numerical evaluations
are performed for the exponential, Weibull and gamma distributions.
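To make the log-linear formulation concrete, two of the named models can be written in that form. The notation is an assumption introduced here for illustration ($\theta$ for the scale parameter, $s$ for the transformed stress, $\beta_0$ and $\beta_1$ for the coefficients) and is not taken from [3,4].

```latex
\log \theta(s) = \beta_0 + \beta_1 s, \qquad
s = 1/T \quad \text{(Arrhenius, } T \text{ the absolute temperature)}, \qquad
s = \log V \quad \text{(inverse power law, } V \text{ the applied stress)}
```

Under the inverse power law, for example, $\theta = c V^{-p}$ gives $\beta_0 = \log c$ and $\beta_1 = -p$.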

A semi-Markov process is formulated in [5] for a purchasing model
where the inverse Gaussian distribution is used for the interpurchase times.
To account for population heterogeneity, a natural conjugate prior for the
model parameters is developed and the compounding distribution is fitted to
panel data. A number of important summary measures are derived. These include
the market share of a specified brand as a function of time, the long run
behavior of the interval transition probabilities and the market share, and the
probability distribution of the number of purchases in a given time span.

Semi-Markov process models for wearout have proven valuable in the
determination of maintenance and replacement schedules for many complex
hardware systems. In his thesis [6], M. Akritas studies inference procedures
for continuous time stochastic processes. Optimal statistical procedures are
derived for situations where the series is observed over $(0, t)$ and $t \to \infty$.
The results obtained are quite general. Markov processes and semi-Markov
processes are treated in detail. An outgrowth of this thesis work is the
related work [7] giving general conditions for contiguity.

In  epidemiological  investigations  it  is  often  necessary  to  estimate 
the  infection  rate  in  the  population  of  disease  transmitting  agents.  Because 
the number of specimens in the sample is frequently high, it is practically
impossible to assay each specimen individually. Instead, the specimens are
randomly  divided  into  a number  of  "pools"  and  each  pool  is  tested  as  a unit. 
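A minimal sketch of the usual point estimate under pooling follows. It assumes equal pool sizes, a perfectly accurate assay, and independent specimens with a common infection probability p; these assumptions and the function name are illustrative and are not drawn from [8]. A pool of m specimens tests negative with probability $(1 - p)^m$, so equating this to the observed fraction of negative pools gives the estimate.

```python
def pooled_infection_rate(positive_pools, total_pools, pool_size):
    """Estimate the per-specimen infection rate from pooled test results."""
    if not 0 <= positive_pools <= total_pools:
        raise ValueError("positive_pools must lie between 0 and total_pools")
    negative_fraction = 1.0 - positive_pools / total_pools
    # Solve (1 - p) ** pool_size = negative_fraction for p.
    return 1.0 - negative_fraction ** (1.0 / pool_size)


estimate = pooled_infection_rate(positive_pools=5, total_pools=20, pool_size=10)
print(round(estimate, 4))
```

When every pool is positive the estimate equals 1, which shows why pool sizes are usually kept small enough for some pools to test negative.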


In [8] we derive the confidence interval for the infection rate when the
data come from pool testing. We also derive point estimates for the situation
where the finite sample itself is considered to be the population under